Main Materials
Create a table of differential expression results
We have a list of significantly differentially expressed genes, but the only annotation we can see is the Entrez Gene ID, which is not very informative.
results <- as.data.frame(topTags(lrt_BvsL, n = Inf))
head(results)
logFC logCPM LR PValue FDR
110308 -8.940579 10.264297 24.89789 6.044844e-07 0.0004377961
50916 -8.636503 5.749781 24.80037 6.358512e-07 0.0004377961
12293 -8.362247 6.794788 24.68526 6.749827e-07 0.0004377961
56069 -8.419433 6.124377 24.41532 7.764861e-07 0.0004377961
24117 -9.290691 6.757163 24.32506 8.137331e-07 0.0004377961
12818 -8.216790 8.172247 24.24233 8.494462e-07 0.0004377961
Visualise results with a ‘Smear plot’
edgeR provides a function plotSmear that allows us to visualise the results of a DE analysis. This plot shows the log-fold change against log-counts per million, with DE genes highlighted:
# identify differentially expressed genes
de <- decideTestsDGE(lrt_BvsL)
summary(de)
CellTypeluminal
Down 2957
NotSig 11232
Up 1615
# create a vector of differentially expressed genes
detags <- rownames(dgeObj)[as.logical(de)]
# plot smear highlighting DE genes
plotSmear(lrt_BvsL, de.tags = detags)
However, on such a plot it would be nice to add labels to highlight the genes with most evidence for being DE, or our favourite genes. To perform such a task we need to map between the identifiers we have in the edgeR output and more familiar names.
Finally, we will look at sophisticated visualisations that allow us to incorporate information about the structure of a gene, level of sequencing coverage.
Adding annotation to the edgeR results
There are a number of ways to add annotation, but we will demonstrate how to do this using the org.Mm.eg.db package. This package is one of several organism-level packages which are re-built every 6 months. These packages are listed on the annotation section of the Bioconductor, and are installed in the same way as regular Bioconductor packages. An alternative approach is to use biomaRt, an interface to the BioMart resource. BioMart is much more comprehensive, but the organism packages fit better into the Bioconductor workflow.
# source("http://www.bioconductor.org/biocLite.R")
# # for mouse
# biocLite("org.Mm.eg.db")
# #For Human
# biocLite("org.Hs.eg.db")
The packages are larger in size that Bioconductor software pacakges, but essentially they are databases that can be used to make offline queries.
library(org.Mm.eg.db)
First we need to decide what information we want. In order to see what we can extract we can run the columns function on the annotation database.
columns(org.Mm.eg.db)
[1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS" "ENTREZID" "ENZYME"
[8] "EVIDENCE" "EVIDENCEALL" "GENENAME" "GO" "GOALL" "IPI" "MGI"
[15] "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM" "PMID" "PROSITE" "REFSEQ"
[22] "SYMBOL" "UNIGENE" "UNIPROT"
We are going to filter the database by a key or set of keys in order to extract the information we want. Valid names for the key can be retrieved with the keytypes function.
keytypes(org.Mm.eg.db)
[1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS" "ENTREZID" "ENZYME"
[8] "EVIDENCE" "EVIDENCEALL" "GENENAME" "GO" "GOALL" "IPI" "MGI"
[15] "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM" "PMID" "PROSITE" "REFSEQ"
[22] "SYMBOL" "UNIGENE" "UNIPROT"
We should see ENTREZID, which is the type of key we are going to use in this case. If we are unsure what values are acceptable for the key, we can check what keys are valid with keys
keys(org.Mm.eg.db, keytype="ENTREZID")[1:10]
[1] "11287" "11298" "11302" "11303" "11304" "11305" "11306" "11307" "11308" "11350"
It is a useful sanity check to make sure that the keys you want to use are all valid. We could use %in% in this case.
## Build up the query step-by-step
my.keys <- rownames(results)[1:10]
my.keys %in% keys(org.Mm.eg.db, keytype="ENTREZID")
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
all(my.keys %in% keys(org.Mm.eg.db, keytype="ENTREZID"))
[1] TRUE
To annotate our results, we definitely want gene symbols and perhaps the full gene name. Let’s build up our annotation information in a separate data frame using the select function.
annot <- select(org.Mm.eg.db,
keys=rownames(results),
columns=c("ENTREZID","SYMBOL","GENENAME"))
'select()' returned 1:1 mapping between keys and columns
# Have a look at the annotation
head(annot)
ENTREZID SYMBOL GENENAME
1 110308 Krt5 keratin 5
2 50916 Irx4 Iroquois homeobox 4
3 12293 Cacna2d1 calcium channel, voltage-dependent, alpha2/delta subunit 1
4 56069 Il17b interleukin 17B
5 24117 Wif1 Wnt inhibitory factor 1
6 12818 Col14a1 collagen, type XIV, alpha 1
Let’s double check that the ENTREZID column matches exactly to our results rownames.
table(annot$ENTREZID==rownames(results))
TRUE
15804
We can bind in the annotation information to the results data frame. (Please note that if the select function returns a 1:many mapping then you can’t just append the annotation to the fit object.)
results.annotated <- cbind(results, annot)
head(results.annotated)
logFC logCPM LR PValue FDR ENTREZID SYMBOL
110308 -8.940579 10.264297 24.89789 6.044844e-07 0.0004377961 110308 Krt5
50916 -8.636503 5.749781 24.80037 6.358512e-07 0.0004377961 50916 Irx4
12293 -8.362247 6.794788 24.68526 6.749827e-07 0.0004377961 12293 Cacna2d1
56069 -8.419433 6.124377 24.41532 7.764861e-07 0.0004377961 56069 Il17b
24117 -9.290691 6.757163 24.32506 8.137331e-07 0.0004377961 24117 Wif1
12818 -8.216790 8.172247 24.24233 8.494462e-07 0.0004377961 12818 Col14a1
GENENAME
110308 keratin 5
50916 Iroquois homeobox 4
12293 calcium channel, voltage-dependent, alpha2/delta subunit 1
56069 interleukin 17B
24117 Wnt inhibitory factor 1
12818 collagen, type XIV, alpha 1
We can save the results table using the write.csv function, which writes the results out to a csv file that you can open in excel.
write.csv(results.annotated,
file="B.PregVsLacResults.csv",
row.names=FALSE)
A note about deciding how many genes are significant: In order to decide which genes are differentially expressed, we usually take a cut-off of 0.05 on the adjusted p-value, NOT the raw p-value. This is because we are testing more than 15000 genes, and the chances of finding differentially expressed genes is very high when you do that many tests. Hence we need to control the false discovery rate, which is the adjusted p-value column in the results table. What this means is that if 100 genes are significant at a 5% false discovery rate, we are willing to accept that 5 will be false positives. Note that the decideTests function displays significant genes at 5% FDR.
Challenge
Re-visit the
plotSmearplot from above and use thetextfunction to add labels for the names of the top 20 most DE genes
Further visualisation
Volcano plot
Another common visualisation is the volcano plot which displays a measure of significance on the y-axis and fold-change on the x-axis.
signif <- -log10(results.annotated$FDR)
plot(results.annotated$logFC,
signif,
bg="black",
pch=21,
xlab="log2(Fold Change)")
points(results.annotated[detags,"logFC"],
-log10(results.annotated[detags,"FDR"]),
pch=21,
bg="red")
Strip chart for gene expression
Before following up on the DE genes with further lab work, a recommended sanity check is to have a look at the expression levels of the individual samples for the genes of interest. We can quickly look at grouped expression using stripchart. We can use the normalised log expression values in the dgeCounts object (dgeCounts$counts).
normCounts <- cpm(dgeObj, log = T)
# Let's look at the first gene in the topTable, Krt5, which has a rowname 110308
stripchart(normCounts["110308",]~sampleinfo$Group,
xlab="",
ylab="log2(Counts)",
vertical=TRUE,
las=2,
pch=21,
col=1:6,
cex=2)
Interactive StripChart with Glimma
An interactive version of the volcano plot above that includes the raw per sample values in a separate panel is possible via the glXYPlot function in the Glimma package.
library(Glimma)
group <- sampleinfo$Group
levels(group) <- c("basal.lactate","basal.preg","basal.virgin","lum.lactate", "lum.preg", "lum.virgin")
annot.mod <- annot
colnames(annot.mod)[1] <- "GeneID"
de <- as.numeric(results$FDR<=0.01)
glXYPlot(x=results$logFC, y=-log10(results$FDR),
xlab="logFC", ylab="B", main="B.PregVsLac",
counts=normCounts, groups=group, status=de,
anno=annot.mod, id.column="ENTREZID", folder="volcano")
This function creates an html page (./volcano/XY-Plot.html) with a volcano plot on the left and a plot showing the log-CPM per sample for a selected gene on the right. A search bar is available to search for genes of interest.
Retrieving Genomic Locations
It might seem natural to add genomic locations to our annotation table, and possibly a bit odd that the org.Mm.eg.db package does not supply such mappings. In fact, there is a whole suite of package for performing this, and more-advanced queries that relate to the location of genes. These are listed on the Bioconductor annotation page and have the prefix TxDb.
The package we will be using is TxDb.Mmusculus.UCSC.mm10.knownGene. Packages are available for other organisms and genome builds. It is even possible to build your own database if one does not exist. See vignette("GenomicFeatures") for details
# source("http://www.bioconductor.org/biocLite.R")
## For mouse
# biocLite("TxDb.Mmusculus.UCSC.mm10.knownGene")
## For Humans
# biocLite("TxDb.Hsapiens.UCSC.hg19.knownGene")
We load the library in the usual fashion and create a new object to save some typing. As with the org. packages, we can query what columns are available with columns,
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
tx <- TxDb.Mmusculus.UCSC.mm10.knownGene
columns(tx)
[1] "CDSCHROM" "CDSEND" "CDSID" "CDSNAME" "CDSSTART" "CDSSTRAND" "EXONCHROM" "EXONEND"
[9] "EXONID" "EXONNAME" "EXONRANK" "EXONSTART" "EXONSTRAND" "GENEID" "TXCHROM" "TXEND"
[17] "TXID" "TXNAME" "TXSTART" "TXSTRAND" "TXTYPE"
The select function is used in the same manner as the org.Mm.eg.db packages.
Challenge 2
Use the TxDb.Mmusculus.UCSC.mm10.knownGene package to retrieve the exon coordinates for the genes
50916,110308,12293
Overview of GenomicRanges
One of the real strengths of the txdb.. packages is the ability of interface with GenomicRanges, which is the object type used throughout Bioconductor to manipulate Genomic Intervals.
These object types permit us to perform common operations on intervals such as overlapping and counting. We can define the chromosome, start and end position of each region (also strand too, but not shown here).
library(GenomicRanges)
simple_range <- GRanges(seqnames = "1", ranges = IRanges(start=1000, end=2000))
simple_range
GRanges object with 1 range and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] 1 1000-2000 *
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
We don’t have to have all our ranges located on the same chromosome
chrs <- c("chr13", "chr15", "chr5")
start <- c(73000000, 101000000, 15000000)
end <- c(74000000, 102000000, 16000000)
my_ranges <- GRanges(seqnames = rep(chrs, 3),
ranges = IRanges(start = rep(start, each = 3),
end = rep(end, each = 3))
)
my_ranges
GRanges object with 9 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr13 73000000-74000000 *
[2] chr15 73000000-74000000 *
[3] chr5 73000000-74000000 *
[4] chr13 101000000-102000000 *
[5] chr15 101000000-102000000 *
[6] chr5 101000000-102000000 *
[7] chr13 15000000-16000000 *
[8] chr15 15000000-16000000 *
[9] chr5 15000000-16000000 *
-------
seqinfo: 3 sequences from an unspecified genome; no seqlengths
There are a number of useful functions for calculating properties of the data (such as coverage or sorting). Not so much for RNA-seq analysis, but GenomicRanges are used throughout Bioconductor for the analysis of NGS data.
For instance, we can quickly identify overlapping regions between two GenomicRanges.
keys <- c("50916", "110308", "12293")
genePos <- select(tx,
keys = keys,
keytype = "GENEID",
columns = c("EXONCHROM", "EXONSTART", "EXONEND")
)
'select()' returned 1:many mapping between keys and columns
geneRanges <- GRanges(genePos$EXONCHROM,
ranges = IRanges(genePos$EXONSTART, genePos$EXONEND),
GENEID = genePos$GENEID)
geneRanges
GRanges object with 58 ranges and 1 metadata column:
seqnames ranges strand | GENEID
<Rle> <IRanges> <Rle> | <character>
[1] chr13 73260497-73260876 * | 50916
[2] chr13 73264848-73264979 * | 50916
[3] chr13 73265458-73265709 * | 50916
[4] chr13 73266596-73266708 * | 50916
[5] chr13 73267504-73267832 * | 50916
... ... ... ... . ...
[54] chr5 16370558-16374511 * | 12293
[55] chr5 16341990-16342010 * | 12293
[56] chr5 16326327-16326383 * | 12293
[57] chr5 16322539-16322595 * | 12293
[58] chr5 16267376-16268604 * | 12293
-------
seqinfo: 3 sequences from an unspecified genome; no seqlengths
findOverlaps(my_ranges, geneRanges)
Hits object with 16 hits and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
[1] 1 1
[2] 1 2
[3] 1 3
[4] 1 4
[5] 1 5
... ... ...
[12] 5 12
[13] 5 13
[14] 5 14
[15] 5 15
[16] 9 16
-------
queryLength: 9 / subjectLength: 58
However, we have to pay attention to the naming convention used for each object. seqlevelsStyle can help.
seqlevelsStyle(geneRanges)
[1] "UCSC"
seqlevelsStyle(my_ranges)
[1] "UCSC"
seqlevelsStyle(simple_range)
[1] "NCBI" "Ensembl" "MSU6" "AGPvF"
Exporting tracks
It is also possible to save the results of a Bioconductor analysis in a browser to enable interactive analysis and integration with other data types, or sharing with collaborators. For instance, we might want a browser track to indicate where our differentially-expressed genes are located. We shall use the bed format to display these locations. We will annotate the ranges with information from our analysis such as the fold-change and significance.
First we create a data frame for just the DE genes.
sigGenes <- results.annotated[detags,]
message("Number of significantly DE genes: ", nrow(sigGenes))
Number of significantly DE genes: 4572
head(sigGenes)
logFC logCPM LR PValue FDR ENTREZID SYMBOL
497097 -10.947240 2.5236515 23.590694 1.191624e-06 0.0004377961 497097 Xkr4
20671 -2.673131 1.2418640 10.587241 1.138708e-03 0.0078553551 20671 Sox17
58175 4.471434 1.1240115 14.373441 1.499018e-04 0.0020196485 58175 Rgs20
76187 3.033003 2.4071013 8.475630 3.599356e-03 0.0182496716 76187 Adhfe1
72481 2.136618 -0.2472752 6.137581 1.323382e-02 0.0468231045 72481 2610203C22Rik
329093 -9.262471 0.7951431 22.116319 2.566190e-06 0.0004377961 329093 Cpa6
GENENAME
497097 X-linked Kx blood group related 4
20671 SRY (sex determining region Y)-box 17
58175 regulator of G-protein signaling 20
76187 alcohol dehydrogenase, iron containing, 1
72481 RIKEN cDNA 2610203C22 gene
329093 carboxypeptidase A6
Create a genomic ranges object
Several convenience functions exist to retrieve the structure of every gene from a given TxDb object in one list. The output of exonsBy is a list, where each item in the list is the exon co-ordinates of a particular gene, however, we do not need this level of granularity for the bed output, so we will collapse to a single region for each gene.
First we use the range function to obtain a single range for every gene and tranform to a more convenient object with unlist.
exo <- exonsBy(tx, "gene")
exoRanges <- unlist(range(exo))
sigRegions <- exoRanges[na.omit(match(sigGenes$ENTREZID, names(exoRanges)))]
sigRegions
GRanges object with 4393 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
497097 chr1 3214482-3671498 -
20671 chr1 4490928-4497354 -
58175 chr1 4909576-5070285 -
76187 chr1 9548046-9577968 +
72481 chr1 9560833-9631092 -
... ... ... ...
195727 chrX 161836430-162159441 -
108012 chrX 163909017-163933666 +
56078 chrX 163976822-164028010 -
54156 chrX 166523007-166585716 -
333605 chrX 167471306-168577233 -
-------
seqinfo: 66 sequences (1 circular) from mm10 genome
For visualisation purposes, we are going to restrict the data to genes that are located on chromosomes 1 to 19 and the sex chromosomes. This can be done with the keepSeqLevels function.
seqlevels(sigRegions)
[1] "chr1" "chr2" "chr3" "chr4"
[5] "chr5" "chr6" "chr7" "chr8"
[9] "chr9" "chr10" "chr11" "chr12"
[13] "chr13" "chr14" "chr15" "chr16"
[17] "chr17" "chr18" "chr19" "chrX"
[21] "chrY" "chrM" "chr1_GL456210_random" "chr1_GL456211_random"
[25] "chr1_GL456212_random" "chr1_GL456213_random" "chr1_GL456221_random" "chr4_GL456216_random"
[29] "chr4_GL456350_random" "chr4_JH584292_random" "chr4_JH584293_random" "chr4_JH584294_random"
[33] "chr4_JH584295_random" "chr5_GL456354_random" "chr5_JH584296_random" "chr5_JH584297_random"
[37] "chr5_JH584298_random" "chr5_JH584299_random" "chr7_GL456219_random" "chrX_GL456233_random"
[41] "chrY_JH584300_random" "chrY_JH584301_random" "chrY_JH584302_random" "chrY_JH584303_random"
[45] "chrUn_GL456239" "chrUn_GL456359" "chrUn_GL456360" "chrUn_GL456366"
[49] "chrUn_GL456367" "chrUn_GL456368" "chrUn_GL456370" "chrUn_GL456372"
[53] "chrUn_GL456378" "chrUn_GL456379" "chrUn_GL456381" "chrUn_GL456382"
[57] "chrUn_GL456383" "chrUn_GL456385" "chrUn_GL456387" "chrUn_GL456389"
[61] "chrUn_GL456390" "chrUn_GL456392" "chrUn_GL456393" "chrUn_GL456394"
[65] "chrUn_GL456396" "chrUn_JH584304"
sigRegions <- keepSeqlevels(sigRegions,
value = paste0("chr", c(1:19,"X","Y")),
pruning.mode="tidy")
seqlevels(sigRegions)
[1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9" "chr10" "chr11" "chr12" "chr13"
[14] "chr14" "chr15" "chr16" "chr17" "chr18" "chr19" "chrX" "chrY"
Add metadata to GRanges object
A useful propery of GenomicRanges is that we can attach metadata to each range using the mcols function. The metadata can be supplied in the form of a data frame.
mcols(sigRegions) <- sigGenes[match(names(sigRegions), rownames(sigGenes)), ]
sigRegions
GRanges object with 4392 ranges and 8 metadata columns:
seqnames ranges strand | logFC logCPM LR
<Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric>
497097 chr1 3214482-3671498 - | -10.9472399580736 2.5236514829963 23.590694348389
20671 chr1 4490928-4497354 - | -2.67313117621058 1.24186399534385 10.5872405573132
58175 chr1 4909576-5070285 - | 4.47143440821072 1.12401146729236 14.3734409590833
76187 chr1 9548046-9577968 + | 3.0330034607647 2.40710126232655 8.47563041277367
72481 chr1 9560833-9631092 - | 2.13661774722519 -0.247275168961162 6.13758113260831
... ... ... ... . ... ... ...
195727 chrX 161836430-162159441 - | -4.69261763279776 3.10777653736015 13.9121195008167
108012 chrX 163909017-163933666 + | -2.17699584124451 2.32098302864258 10.2017044061552
56078 chrX 163976822-164028010 - | 2.5883614796138 5.1331378732911 6.92309295315483
54156 chrX 166523007-166585716 - | -6.96434996075043 4.39157591753061 19.4442035074456
333605 chrX 167471306-168577233 - | -3.89607089373906 0.656193636646942 6.91789090661196
PValue FDR ENTREZID SYMBOL
<numeric> <numeric> <character> <character>
497097 1.19162399345158e-06 0.000437796062030272 497097 Xkr4
20671 0.00113870810229744 0.00785535505018016 20671 Sox17
58175 0.000149901775017694 0.00201964846750182 58175 Rgs20
76187 0.00359935626571388 0.0182496716148034 76187 Adhfe1
72481 0.0132338222037952 0.0468231044850456 72481 2610203C22Rik
... ... ... ... ...
195727 0.000191559297917662 0.00234386623585872 195727 Nhs
108012 0.00140310912515326 0.00917069338871881 108012 Ap1s2
56078 0.00850896813252038 0.033890053519746 56078 Car5b
54156 1.03581719496295e-05 0.000520513371686814 54156 Egfl6
333605 0.00853375663282725 0.0339270997902079 333605 Frmpd4
GENENAME
<character>
497097 X-linked Kx blood group related 4
20671 SRY (sex determining region Y)-box 17
58175 regulator of G-protein signaling 20
76187 alcohol dehydrogenase, iron containing, 1
72481 RIKEN cDNA 2610203C22 gene
... ...
195727 NHS actin remodeling regulator
108012 adaptor-related protein complex 1, sigma 2 subunit
56078 carbonic anhydrase 5b, mitochondrial
54156 EGF-like-domain, multiple 6
333605 FERM and PDZ domain containing 4
-------
seqinfo: 21 sequences from mm10 genome
Scores and colour on exported tracks
The .bed file format is commonly used to store genomic locations for display in genome browsers (e.g. the UCSC browser or IGV) as tracks. Rather than just representing the genomic locations, the .bed format is also able to colour each range according to some property of the analysis (e.g. direction and magnitude of change) to help highlight particular regions of interest. A score can also be displayed when a particular region is clicked-on.
For the score we can use the \(-log_{10}\) of the adjusted p-value and colour scheme for the regions based on the fold-change
colorRampPalette is a useful function in base R for constructing a palette between two extremes. When choosing colour palettes, make sure they are colour blind friendly. The red / green colour scheme traditionally-applied to microarrays is a bad choice.
We will also truncate the fold-changes to between -5 and 5 to and divide this range into 10 equal bins
rbPal <- colorRampPalette(c("red", "blue"))
logfc <- pmax(sigRegions$logFC, -5)
logfc <- pmin(logfc , 5)
Cols <- rbPal(10)[as.numeric(cut(logfc, breaks = 10))]
The colours and score have to be saved in the GRanges object as score and itemRgb columns respectively, and will be used to construct the browser track. The rtracklayer package can be used to import and export browsers tracks.
Now we can export the signifcant results from the DE analysis as a .bed track using rtracklayer. You can load the resulting file in IGV, if you wish.
mcols(sigRegions)$score <- -log10(sigRegions$FDR)
mcols(sigRegions)$itemRgb <- Cols
sigRegions
GRanges object with 4392 ranges and 10 metadata columns:
seqnames ranges strand | logFC logCPM LR
<Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric>
497097 chr1 3214482-3671498 - | -10.9472399580736 2.5236514829963 23.590694348389
20671 chr1 4490928-4497354 - | -2.67313117621058 1.24186399534385 10.5872405573132
58175 chr1 4909576-5070285 - | 4.47143440821072 1.12401146729236 14.3734409590833
76187 chr1 9548046-9577968 + | 3.0330034607647 2.40710126232655 8.47563041277367
72481 chr1 9560833-9631092 - | 2.13661774722519 -0.247275168961162 6.13758113260831
... ... ... ... . ... ... ...
195727 chrX 161836430-162159441 - | -4.69261763279776 3.10777653736015 13.9121195008167
108012 chrX 163909017-163933666 + | -2.17699584124451 2.32098302864258 10.2017044061552
56078 chrX 163976822-164028010 - | 2.5883614796138 5.1331378732911 6.92309295315483
54156 chrX 166523007-166585716 - | -6.96434996075043 4.39157591753061 19.4442035074456
333605 chrX 167471306-168577233 - | -3.89607089373906 0.656193636646942 6.91789090661196
PValue FDR ENTREZID SYMBOL
<numeric> <numeric> <character> <character>
497097 1.19162399345158e-06 0.000437796062030272 497097 Xkr4
20671 0.00113870810229744 0.00785535505018016 20671 Sox17
58175 0.000149901775017694 0.00201964846750182 58175 Rgs20
76187 0.00359935626571388 0.0182496716148034 76187 Adhfe1
72481 0.0132338222037952 0.0468231044850456 72481 2610203C22Rik
... ... ... ... ...
195727 0.000191559297917662 0.00234386623585872 195727 Nhs
108012 0.00140310912515326 0.00917069338871881 108012 Ap1s2
56078 0.00850896813252038 0.033890053519746 56078 Car5b
54156 1.03581719496295e-05 0.000520513371686814 54156 Egfl6
333605 0.00853375663282725 0.0339270997902079 333605 Frmpd4
GENENAME score itemRgb
<character> <numeric> <character>
497097 X-linked Kx blood group related 4 3.35872814922358 #FF0000
20671 SRY (sex determining region Y)-box 17 2.10483418072595 #C60038
58175 regulator of G-protein signaling 20 2.69472421565727 #0000FF
76187 alcohol dehydrogenase, iron containing, 1 1.73874494584652 #1C00E2
72481 RIKEN cDNA 2610203C22 gene 1.3295397949105 #3800C6
... ... ... ...
195727 NHS actin remodeling regulator 2.63006717707634 #FF0000
108012 adaptor-related protein complex 1, sigma 2 subunit 2.0375978264336 #C60038
56078 carbonic anhydrase 5b, mitochondrial 1.46992774531689 #3800C6
54156 EGF-like-domain, multiple 6 3.28356810923663 #FF0000
333605 FERM and PDZ domain containing 4 1.46945326381326 #E2001C
-------
seqinfo: 21 sequences from mm10 genome
library(rtracklayer)
export(sigRegions , con = "topHits.bed")